Phase transitions for high dimensional clustering and related problems
Similar resources
Phase Transitions for High Dimensional Clustering and Related Problems
Consider a two-class clustering problem where we observe $X_i = \ell_i \mu + Z_i$, $Z_i \overset{iid}{\sim} N(0, I_p)$, $1 \le i \le n$. The feature vector $\mu \in \mathbb{R}^p$ is unknown but is presumably sparse. The class labels $\ell_i \in \{-1, 1\}$ are also unknown and the main interest is to estimate them. We are interested in the statistical limits. In the two-dimensional phase space calibrating the rarity and strengths of useful features, we fin...
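To make the model in this abstract concrete, the following minimal Python sketch draws data of the form X_i = l_i * mu + Z_i. It is an illustration only, not the authors' code: the function name `simulate_two_class`, the parameters `num_useful` and `strength`, the equal-probability labels, and the choice to place the nonzero entries of mu in the first coordinates are all assumptions made for the example.

```python
import random

def simulate_two_class(n, p, num_useful, strength, seed=0):
    """Draw X_i = l_i * mu + Z_i with Z_i ~ N(0, I_p) and a sparse mu.

    All argument names are hypothetical; they are not taken from the paper.
    """
    rng = random.Random(seed)
    # Sparse feature vector: only `num_useful` coordinates are nonzero.
    # Putting them first is an illustrative choice.
    mu = [strength if j < num_useful else 0.0 for j in range(p)]
    # Class labels in {-1, +1}; equal probabilities are an assumption.
    labels = [rng.choice((-1, 1)) for _ in range(n)]
    # Each row is one observation: signed mean plus standard normal noise.
    X = [[labels[i] * mu[j] + rng.gauss(0.0, 1.0) for j in range(p)]
         for i in range(n)]
    return X, labels, mu

X, labels, mu = simulate_two_class(n=50, p=200, num_useful=10, strength=1.0)
```

In a clustering experiment only `X` would be passed to the method under study; `labels` and `mu` are kept aside to measure how well the labels are recovered.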
Clustering Problems for High Dimensional Data
We consider a clustering problem where we observe feature vectors $X_i \in \mathbb{R}^p$, $i = 1, 2, \ldots, n$, from several possible classes. The class labels are unknown and the main interest is to estimate these labels. We propose a three-step clustering procedure where we first evaluate the significance of each feature by the Kolmogorov-Smirnov statistic, then we select the small fraction of features for w...
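The first step named in this abstract, scoring each feature with a Kolmogorov-Smirnov statistic, might look roughly like the sketch below. It is a hedged illustration rather than the authors' procedure: comparing each standardized feature with the standard normal distribution is an assumption, and the names `normal_cdf`, `ks_statistic`, and `ks_scores` are hypothetical. The selection rule and the remaining step are truncated in the abstract, so they are not reproduced here.

```python
import math
import random

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_statistic(values):
    """One-sample KS distance between standardized values and N(0, 1).

    Assumes the values are not all identical (nonzero sample variance).
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    z = sorted((v - mean) / sd for v in values)
    d = 0.0
    for i, zi in enumerate(z, start=1):
        f = normal_cdf(zi)
        # Largest gap between the empirical CDF steps and the normal CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_scores(X):
    """Score every feature (column) of the row-major data matrix X."""
    p = len(X[0])
    return [ks_statistic([row[j] for row in X]) for j in range(p)]

# Toy data: feature 0 separates two groups, feature 1 is pure noise.
rng = random.Random(0)
n = 200
X = [[(3.0 if i % 2 == 0 else -3.0) + rng.gauss(0.0, 1.0),
      rng.gauss(0.0, 1.0)] for i in range(n)]
scores = ks_scores(X)
```

On this toy data the bimodal feature receives the larger score, which is the behaviour a feature-screening step relies on.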
Phase Transitions and Clustering Properties of Optimization Problems
A central subject in computer science is the class of NP-complete problems. These problems are hard in the sense that no fast algorithms for solving them are known. Interestingly, many practical applications belong to this class. When studying problems on suitably parametrized ensembles, one finds phenomena strongly reminiscent of the phase transitions that appear in physical systems. One also observes ...
Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation
We study a spectral initialization method that serves as a key ingredient in recent work on using efficient iterative algorithms for estimating signals in nonconvex settings. Unlike previous analysis in the literature, which is restricted to the phase retrieval setting and which provides only performance bounds, we consider arbitrary generalized linear sensing models and present a precise asymp...
Distributional Similarity, Phase Transitions and Hierarchical Clustering
We describe a method for automatically clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy is used to measure the dissimilarity of those distributions. Clusters are represented by "typical" context distributions averaged from the given words accordi...
Journal
Journal title: The Annals of Statistics
Year: 2017
ISSN: 0090-5364
DOI: 10.1214/16-aos1522